Abstract

The methodology of single-case experimental designs (SCED) has been expanding its efforts toward rigorous 
design tactics to address a variety of research questions related to intervention effectiveness. Effect size 
indicators appropriate to quantify the magnitude and the direction of interventions have been recommended 
and intensively studied for the major SCED design tactics, such as reversal designs, multiple-baseline designs 
across participants, and alternating treatment designs. In order to address complex and more sophisticated 
research questions, two or more different single-case design tactics can be merged (i.e., “combined SCEDs”). 
The two most common combined SCEDs are (a) a combination of a multiple-baseline design across 
participants with an embedded ABAB reversal design, and (b) a combination of a multiple-baseline design 
across participants with an embedded alternating treatment design. While these combined designs have the 
potential to address complex research questions and demonstrate functional relations, the development and 
use of proper effect size indicators lag behind and remain unexplored. Therefore, this study probes into the 
quantitative analysis of combined SCEDs using regression-based effect size estimates and two-level hierarchical
linear modeling. This study is the first demonstration of effect size estimation for combined designs.


Keywords: Combined designs; effect size; hierarchical linear modeling; regression models; single- 
case experimental design. 


Single-case experimental designs (SCEDs) are 
rigorous experimental designs that have been 
applied in a variety of fields (e.g., biomedical 
research, language and speech therapy, beha- 
vior modification, school psychology, counsel- 
ing psychology, physical therapy, special 
education, and neuropsychological rehabilita- 
tion) to evaluate the efficacy and effectiveness 
of interventions (Kennedy, 2005; Kratochwill 
et al., 2014; Moeyaert, Ferron, et al., 2014). In 
SCEDs, a case (one unit [e.g., participant], or
an aggregate unit such as a class) is measured
repeatedly across time during conditions (e.g., 
baseline and intervention condition or multi- 
ple intervention conditions). Data from differ- 
ent conditions are compared to evaluate the 
efficacy or effectiveness of one or multiple 
interventions. The basic question examined 
using SCEDs is whether there is evidence for 
a functional relation between the systematic 
manipulation of an independent variable (i.e., 
the conditions) and its consistent effect on 
a dependent variable (i.e., the target behavior) 
(Kratochwill et al., 2010; Kratochwill & Levin, 
2014; J. Ledford et al., 2018). 

Valid and reliable structured visual analysis
techniques (J. Ferron & Jones, 2006;
Kratochwill et al., 2010) have been developed
for interpreting SCED results and are
widespread. Visual analysis has a rich history
and is strongly embedded in the field of
SCEDs. It is considered to be a valid
approach for identifying “weak”, “moderate”,
or “strong” evidence for a causal relationship
between independent and dependent variables
by evaluating data using the six steps
described by Kratochwill et al. (2010).
Following the technical documentation
of the What Works Clearinghouse
(WWC) Standards for Design and Analysis of 
SCEDs (Kratochwill et al., 2010), the field 
is now moving toward estimating effect size 
indicators to supplement and support the 
visual analysis results. Efforts have been 
made to develop effect size estimates for 
“single” SCEDs such as the alternating 
treatment design, multiple-baseline design, 
and ABAB reversal design (e.g., Lenz, 2013; 
Maggin et al., 2011; Manolov & Solanas, 
2013; Moeyaert, Ugille, Ferron, Beretvas, 
et al., 2014; Moeyaert, Ugille, Ferron, 
Onghena, et al., 2014; Parker, Vannest, & 
Davis, 2011; Parker et al., 2014; Shadish 
et al., 2008, 2014; Swaminathan et al., 
2010; Wolery et al., 2010). However, the 
formulation of these effect size indicators 
for “combined” SCEDs is not yet fully 
developed. This study is timely, especially 
given the potential of these types of designs 
to answer rich research questions and to 
make internally and externally more valid 
inferences about the efficacy or effective- 
ness of an intervention. 


Combined single-case designs 


Shadish and Sullivan (2011) reviewed
SCED studies published in 2008 to examine
their design and data characteristics.
Their search resulted in 809
unique SCED studies, 73.1% of which
consisted of “single” designs: 54.3% were
Multiple-Baseline Designs (MBD) across
participants; 8.2% represented Withdrawal
and Reversal Designs (WRD, such as ABAB
reversal designs); 8.0% were Alternating
Treatment Designs (ATDs); and 2.6% were
Changing Criterion Designs (CC). The
authors found that a proportion of SCEDs 
(26.9%) do not use a “single” design, but 
rather a design that combines characteris- 
tics of two or more “single” SCED designs — 
so-called “combined SCEDs” (J. Ledford & 
Gast, 2018). Specifically, the combination 
of MBD + WRD appeared to be the most 
popular one (12.0%), followed by the com- 
bination of MBD + ATD (9.9%). 

Combined or combination SCEDs (J. 
Ledford & Gast, 2018) offer three major 
advantages compared to single SCEDs. 
First, they allow assessment of multiple 
research questions. For example, Trottier 
et al. (2011) looked at the functional rela- 
tion between peer-tutoring interventions 
and the number of spontaneous appropriate 
communicative acts generated by students 
with autism spectrum disorder (ASD) as the 
main focus of their study. The use of 
a combined SCED let the researchers exam- 
ine whether normally developing peers 
could independently teach children with 
ASD to use speech-generating devices or 
whether the typically developing peers had 
to first be taught how to instruct the chil- 
dren with ASD. As a result, this combined 
design study allowed the researchers to 
evaluate two different interventions simul- 
taneously: (a) teaching typically developing 
peers to give timely prompts to children 
with ASD to use the device; and (b) letting 
typically developing peers teach children 
with ASD to use the device (Trottier et al., 
2011). Additionally, the two interventions 
were alternated for each child, and the 
interventions were staggered across
participants (n = 2), resulting in an MBD + ATD
combined design.

Second, a combined SCED allows for
more evaluations of the effectiveness of
the treatment as more replications are
present. For example, the MBD + WRD
combined design allows for replication of
a treatment effect after removing and rein- 
troducing the treatment within 
a participant as well as across participants, 
taking into account different start times 
for the treatment. In case of the MBD + 
ATD combined design, the replication of 
alternating treatments can be seen both 
within each participant and across partici- 
pants at different points in time. The repli- 
cation effects can be identified both within 
and across participants. Replication is 
a central theme in SCED studies
(Kratochwill et al., 2010) because it 
enhances the external validity of the 
resulting conclusions. Indeed, there is 
additional documentation of the effect at 
more points in time and more replications 
within one case. 

Third, due to the dynamic nature of com- 
bined designs, they grant an opportunity to 
modify pure SCEDs by adding design ele- 
ments in the middle of the study. For 
instance, Kelley et al. (2002) initially used 
an MBD to investigate the effectiveness of 
competing reinforcement schedules on 
functional communication (Figure 1). 
However, the data revealed a problem:
the disruptive behaviors of two of the
three participants were not decreasing. As
a result, the authors slightly changed the
condition from Functional Communication 
Training (FCT) without extinction to FCT 
with extinction, ensuring treatment fidelity 
for all the other steps in the study. In this 
way, the introduction of the ABAB allowed 
the study to continue and provided an 
opportunity to address the core research 
question. 

The analysis of the majority of the com- 
bined design studies typically relies on visual 
analyses and non-overlap indices to identify 
and make inferences about the intervention 
effects (Chung & Cannella-Malone, 2010;
Jason & Frasure, 1979; Matson & Keyes,
1990; Trottier et al., 2011). For example, 
Lindberg et al. (1999) used an MBD + WRD 
combined design study to evaluate the effects 
of manipulation and reinforcement on self- 
injurious behaviors of two participants, solely 
relying on visual analysis. Another combined 
SCED study, MBD + ATD (Trottier et al., 
2011), reported the results of the effective- 
ness of peer-tutoring on the use of speech- 
generating devices for students with autism 
in social game routines using visual analysis 
and the Percentage of Non-Overlapping Data 
index (PND; Schlosser et al., 2008; Scruggs
et al., 1987). Relying on visual analysis and
non-overlap indices is unfortunate because 
the opportunity is lost to precisely address 
additional questions through quantitative 
summaries (e.g., What is the magnitude of 
the intervention effect? To what extent is 
the intervention immediately effective? To 
what extent does the intervention remain 
effective over time? Are all the participants 
benefiting equally from the intervention?). 
While visual analysis and non-overlap indices 
provide an initial indication of effectiveness 
of an intervention, effect size indices are 
needed to provide additional information 
through quantitative synthesis. Effect size 
indicators can be used to quantify the magni- 
tude of intervention effectiveness at multiple 
points in time both for each participant and 
across participants. In addition, effect size 
estimates are supplemented with a standard 
error that reflects precision for the individual 
estimate and which can be used as a weight 
for quantitative summaries or analyses (i.e., 
multilevel meta-analysis; Moeyaert, 2019). 
Therefore, in this article, we are breaking 
new ground by applying the effect size logic 
to quantify intervention effectiveness for 
combined SCEDs. The effect size estimates 
will provide a more comprehensive picture 
regarding intervention effects by taking into 
account the design complexity of combined 
SCEDs, and they can be used in meta-analyses
to assess generalizability across
interventions and outcome variables.
Previous research has focused on the coding
schemes and synthesis of results for
each of the “single” SCEDs, including the
simple AB phase design, the MBD across
participants, WRD (ABAB), and ATDs
(Moeyaert, Ugille, Ferron, Onghena, et al.,
2014; Shadish, Kyse, et al., 2013).
Researchers have not investigated (1) coding
and effect size estimation for combined
SCEDs, and (2) meta-analysis of studies
involving combined SCEDs. Due to the
lack of methodology to quantify combined
SCEDs, these studies tend to be simplified
or excluded from meta-analyses, which
contributes to biased effect size estimates
and/or publication bias (e.g., Kokina &
Kern, 2010; Wang et al., 2013). Therefore,
we focus on how to quantify treatment
effects for combined designs. Thus, the
purpose of this study is to illustrate effect size
estimation for combined designs using real
data. In particular, we will focus on the
MBD + WRD combined designs (=45.97%)
and the MBD + ATD combined designs
(=37.91%) as they are the two most popular
classes of combined SCEDs: 83.38% of
the combined SCEDs (Shadish & Sullivan,
2011).

Figure 1. An example of modifying the multiple baseline design by adding a phase change reversal. Frequency of target
behaviors for three participants. Adapted from “The Effects of Competing Reinforcement Schedules on the Acquisition of
Functional Communication,” by M. E. Kelley, D. C. Lerman, and C. M. Van Camp, 2002, Journal of Applied Behavior
Analysis, 35(1), p. 62.
[Figure image not reproduced. Phase labels: BL, FCT without Extinction. Y-axis: Responses per Minute (Aggression or Disruption).]


METHOD 


We identified combined design studies and 
then randomly selected one MBD + WRD 
and one MBD + ATD study. Combined 
SCEDs were identified by examining primary 
studies from four meta-analyses of SCEDs 
(Heyvaert et al., 2014; Kokina & Kern, 2010; 
Moeyaert et al., 2019; Shogren et al., 2004) 
and 20 primary studies that evaluated reading 
fluency interventions. These meta-analyses 
and primary SCED studies were chosen 
because the first author had access to raw 
data. The meta-analysis of Heyvaert et al. 
(2014) included 59 studies of which 11 studies 
(i.e., 18.64%) were combined SCEDs. The 
review by Kokina and Kern (2010) consisted 
of 18 SCEDs of which only four (i.e., 22.22%) 
were combined SCEDs. The peer-tutoring 
meta-analysis by Moeyaert et al. (2019) 
included 65 studies and contained nine
combined SCEDs (i.e., 13.85%). The last
meta-analysis (Shogren et al., 2004) had 13 SCED
studies and two of them (15.38%) were
combined SCEDs. Finally, of the 20 primary
studies that examined reading fluency
interventions, seven (i.e., 35%) were
combined SCEDs. Thus, a substantial
proportion of reviewed studies was combined
SCEDs, a finding that is consistent with the 
review of Shadish and Sullivan (2011). The 
full list of the 33 combined design studies 
from the meta-analyses that we reviewed is 
available from the first author upon request. 
Of these combined designs, the combinations 
MBD + WRD (i.e., 58.82%, 20 studies) and 
MBD + ATD (i.e., 23.52%, eight studies) were 
the most popular. This also supports the 
results from the study of Shadish and 
Sullivan (2011) and our decision to focus on 
these two classes of combined SCEDs in this 
study. 

One study per combined SCED type was
randomly selected from the set to demonstrate
the coding of the design matrix and estimation
of the effect sizes; any other study from the
selection could equally have been chosen. The
design matrix gives an overview of the overall
data structure and includes all variables
(e.g., participant identifier, the dependent
variable, the independent variables) together
with scores assigned to these variables. All
variables needed to estimate the effect sizes
of interest should be reflected in the design
matrix. For more information about the design
matrix for SCEDs, see Moeyaert, Ugille,
Ferron, Beretvas, et al. (2014). Raw data for
the dependent variable in SCEDs are
traditionally displayed graphically, as can be
seen in Figure 2 (MBD + WRD) and Figure 3
(MBD + ATD). As a result, researchers can
retrieve raw data from the graphical displays
in primary studies. We used WebPlotDigitizer
(Rohatgi, 2011) to retrieve raw data. The raw
data represent the measures of the dependent
variable over time. The dependent variable
(i.e., targeted behavior) together with other
variables (i.e., phase and time indicators) that
are needed to conduct the statistical analysis
are part of the design matrix. The design
matrix needed for effect size estimation of the
combined designs is displayed in Tables 1 and 4
and will be discussed later. For more
information about the data retrieval process, see
Moeyaert, Maggin, et al. (2016). The raw
data from Figures 2 and 3 can be found in the
supplement to this article (together with the
SAS codes that can be used for the analyses) to
facilitate replication of the analyses
demonstrated in this study, using the same data sets.

Figure 2. An example of the mixed design: MBD + PCR. Percentage of intervals with problem behaviors for three
participants. Adapted from “The Effects of Choice-making on the Problem Behaviors of High School Students with
Intellectual Disabilities,” by S. Seybert, G. Dunlap, and J. Ferro, 1996, Journal of Behavior Education, 6 (1), p. 58.

Figure 3. An example of the Mixed Design: MBD + ATD. Percentage of intervals with challenging behavior for three
participants. Adapted from “The Effects of Presession Manipulations on Automatically Maintained Challenging Behavior
and Task Responding,” by Y.-C. Chung and H. I. Cannella-Malone, 2010, Behavior Modification, 34(6), p. 493.
Table 1. Design matrix for Case 1 (i.e., Scott), Seybert
et al. (1996)

Case  Session  Outcome  A1B1  B1A2  A2B2
1     1        65.92
1     2        29.89
1     3        55.71
1     4        33.46
1     5        50.84
1     6        33.82
1     7        34.15
1     8        27.39
1     9        33.36
1     10       23.35
1     11       20.75
1     12       44.32
1     13       21.51
1     14       60.35
1     15       66.91
1     16       32.76
1     17       48.10
1     18       16.19
1     19       23.37
1     20       22.43
1     21       16.24
1     22       20.29

[The A1B1, B1A2, and A2B2 entries are not legible in the source.]


RESULTS


Effect sizes are used as a complement to
visual analysis in primary studies and can
be used for between-study comparison of
treatment effects and for meta-analytic
purposes. Visual analysis has been well
documented by Kratochwill et al. (2010),
whereas the focus of the current study is
on the quantitative summary of combined
SCEDs. The analyses in the empirical
illustration sections are performed using SAS
software, Version 9.4 (© SAS Institute
Inc.). SAS codes are available in the
supplement to this article.


Multiple-baseline design — Withdrawal or 
reversal design 


To demonstrate the effect size estimation for
the first class of combined SCEDs, we
selected the study of Seybert et al. (1996).
Seybert et al. (1996) investigated the
differences in problem and on-task behaviors in
choice and no-choice conditions of three
independent participants with intellectual
disabilities. In the choice condition,
participants were given a choice of the domestic
task to do. In contrast, in the no-choice
condition, participants were assigned to do
a certain domestic task. The outcome
variable reflected the percentage of problem
behaviors and task engagement in the choice
versus no-choice conditions. The data were
recorded using 15-s partial interval
recording: that is, only the last five seconds
of each 15-s interval were recorded. Data
points per participant ranged from n = 22
(Scott) to n = 29 (Maria). Seybert et al.
(1996) used the combination of the MBD +
WRD to investigate the effectiveness of
choice-making on problem behavior.
A graphical display is given in Figure 2.
Seybert et al. (1996) claimed that the MBD
+ WRD allowed them to provide further
evidence for the changes in the treatment
phase as a result of manipulating the
independent variable (choice versus no-choice
conditions). The inter-rater observer percent
agreement ranged from 81% to 99% for
occurrence and nonoccurrence of problem
behaviors. Seybert et al. (1996) analyzed
the data using visual analysis techniques,
and the results were reported as percentages
of intervals with problem behaviors. This
combined SCED has the potential to
demonstrate a functional relation between the
choice-making condition and problem
behavior as the effectiveness of the treatment can
be evaluated at three or more different
points in time. In addition, most of the
phases included at least five measurements
(one choice and one no-choice condition for
Maria included only four measurements).
The MBD embedded in the combined design
meets the WWC design standards as it
includes at least three potential
demonstrations of treatment effectiveness across at
least three different points in time. The
WRD embedded in the combined design
meets basic replication standards for Scott
and Maria, whereas this is not the case for
Bob. For Bob, withdrawing the treatment
appears to have had no effect. In addition,
the WRD for Bob does not meet the WWC
design standards as there are only two
potential demonstrations of treatment
effectiveness. According to Gast et al. (2018), this
prohibits the conclusion that a functional
relation is present for Bob. Notwithstanding
this non-effect and lack of experimental
control for Bob, effect size estimation for this
combined design can still be meaningful.
Researchers might be interested in
quantifying the size of the effect, and this
quantification can be used to confirm the results based
on the visual analysis. This effect size
estimate can be used afterward for
meta-analytic purposes. We focused on estimating
regression-based effect size estimates for the
occurrence of problem behaviors in
choice-making conditions for three participants
with intellectual disabilities. The statistical
model and empirical illustration are
discussed in the following sections.


Table 2. Results of ordinary least squares analysis and Empirical Bayes analysis per
participant

Case    Parameter       OLS estimate (SE)    Empirical Bayes estimate
                                             (standard error of prediction)
Scott   $\beta_{01}$     61.31 (6.90)         57.74 (11.87)
        $\beta_{11}$    −24.28 (9.34)        −19.30 (-)
        $\beta_{21}$     20.73 (8.91)         18.02 (10.35)
        $\beta_{31}$    −40.01 (9.34)        −37.38 (15.50)
Bob     $\beta_{02}$     38.20 (4.99)         36.37 (11.77)
        $\beta_{12}$    −22.47 (8.39)        −19.30 (-)
        $\beta_{22}$      1.31 (9.55)          2.11 (10.32)
Maria   $\beta_{03}$     16.53 (2.97)         18.90 (11.77)
        $\beta_{13}$    −12.82 (6.64)        −19.30 (-)
        $\beta_{23}$     26.99 (8.40)         29.85 (10.44)
        $\beta_{33}$    −10.90 (7.97)        −10.98 (15.50)


Table 3. Results of two-level analysis across participants

Fixed effects              Parameter           Estimate (SE)      t       p
Baseline level A1          $\theta_{00}$        37.67 (11.74)      3.21    .082
Change in level A1 to B1   $\theta_{10}$       −19.30 (4.59)      −4.21   <.001
Change in level B1 to A2   $\theta_{20}$        16.66 (10.26)      1.62    .227
Change in level A2 to B2   $\theta_{30}$       −24.18 (15.76)     −1.53    .367

Random effects             Parameter           Estimate (SE)      Z       p
Baseline level A1          $\sigma_{u_0}^2$    391.51 (406.93)    0.96    .168
Change in level A1 to B1   $\sigma_{u_1}^2$    0 (/)              /       /
Change in level B1 to A2   $\sigma_{u_2}^2$    236.77 (291.24)    0.81    .208
Change in level A2 to B2   $\sigma_{u_3}^2$    414.59 (701.37)    0.59    .277
Within-case variance       $\sigma_e^2$        207.75 (36.40)     5.71    <.0001


Table 4. Design matrix for Case 1: data retrieved from Chung and Cannella-Malone
(2010)

Case  Session  Outcome    Treatment1  Treatment2
1     1         0.27933   0           0
1     2        29.88827   0           0
1     3        39.38547   0           0
1     4        24.86034   0           0
1     5        22.90503   0           0
1     6        19.55307   0           0
1     7        23.18436   0           0
1     8        46.64804   0           0
1     9         0         ?           0
1     10        0.27933   0           1
1     11        0         ?           0
1     12        0         0           1
1     13        0         ?           0
1     14        ?         0           1
1     15        0.27933   ?           0
1     16        0.27933   0           1
1     17        0.27933   0           1
1     18        0.27933   ?           0

[? marks an entry that is not legible in the source.]


Statistical model Step 1: single-level 
analysis. The single-level analysis can also 
be called an individual analysis as it involves 
a case-by-case evaluation of treatment 
effectiveness. Here, we are interested in 
demonstrating the effectiveness of 
a treatment at different points in time 
within participants. In the simplest
scenario, the results are an estimate of
change in levels between baseline and
treatment phases for each participant
separately. In other words: “Is there evidence
for change in level between adjacent phases?” In
this particular scenario, the design matrix
contains dummy-coded variables indicating
the specific phase to which a measurement
belongs (see Table 1). We chose the
following notation to distinguish between
the consecutive phases: A1 and A2
indicate, respectively, the first and
the second baseline phase, and B1 and B2
denote the first and the second treatment
phase. For the ABAB phase design, three
dummy variables, A1B1, B1A2, and A2B2,
are coded as suggested by Moeyaert, Ugille,
Ferron, Beretvas, et al. (2014) and Shadish,
Kyse, et al. (2013). A1B1 = 1 for all the
measurement occasions after the first
baseline phase, B1A2 = 1 for all the
measurement occasions after the first
treatment phase, and A2B2 equals 1 during
the last treatment phase (see Table 1). In
order to predict the outcome score at the 
ith measurement occasion, the following 
multiple regression equation can be used 
and parameters can be estimated using 
Ordinary Least Squares (i.e., OLS): 


\[
Y_i = \beta_0 + \beta_1\,\mathit{A1B1}_i + \beta_2\,\mathit{B1A2}_i + \beta_3\,\mathit{A2B2}_i + e_i
\quad \text{with } e_i \sim N(0, \sigma_e^2) \tag{1}
\]


When all three dummy-coded variables equal
zero (i.e., A1B1 = B1A2 = A2B2 = 0), the
measurement belongs to the first baseline phase,
whose expected level is $\beta_0$.
Each dummy variable represents the change
from an earlier to its adjacent phase. Thus, for
example, the coefficient of B1A2 refers to the
change in level from B1 to A2 (i.e., the
difference in level between
Treatment 1 and Baseline 2). An extension
here could be to investigate whether there
are changes in linear (Moeyaert, Ugille,
Ferron, Beretvas, et al., 2014) or non-linear
trends (Hembry et al., 2015) or changes in
variance of scores between adjacent phases
(Baek & Ferron, 2013).
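
The dummy coding and the OLS fit of Equation (1) can be sketched in a few lines of Python. This is an illustrative sketch only, not the SAS code from the supplement: the function names (`abab_design_matrix`, `solve`, `fit_ols`) and the outcome scores are made up for the example, and the phase lengths of five sessions each are an assumption. Because each dummy stays at 1 once its phase has started, the fitted coefficients equal the first baseline mean followed by the differences between adjacent phase means.

```python
def abab_design_matrix(phase_lengths):
    """Build rows [intercept, A1B1, B1A2, A2B2] for phases A1, B1, A2, B2.

    Each dummy switches to 1 at the start of its phase and stays 1, so its
    coefficient is the change in level relative to the preceding phase.
    """
    n_a1, n_b1, n_a2, n_b2 = phase_lengths
    rows = []
    for session in range(n_a1 + n_b1 + n_a2 + n_b2):
        a1b1 = 1.0 if session >= n_a1 else 0.0
        b1a2 = 1.0 if session >= n_a1 + n_b1 else 0.0
        a2b2 = 1.0 if session >= n_a1 + n_b1 + n_a2 else 0.0
        rows.append([1.0, a1b1, b1a2, a2b2])
    return rows


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(x, y):
    """Return OLS coefficients, their standard errors, and residual df."""
    p = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(p)]
    beta = solve(xtx, xty)
    residuals = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(x, y)]
    df = len(y) - p
    sigma2 = sum(e * e for e in residuals) / df
    # The k-th diagonal element of (X'X)^-1 is the k-th entry of the solution
    # of X'X v = e_k, where e_k is the k-th unit vector.
    se = []
    for k in range(p):
        unit = [1.0 if i == k else 0.0 for i in range(p)]
        se.append((sigma2 * solve(xtx, unit)[k]) ** 0.5)
    return beta, se, df


# Made-up outcome scores for one case: five sessions per phase (A1, B1, A2, B2).
outcome = [60, 62, 58, 64, 61, 35, 38, 36, 37, 34,
           55, 57, 59, 56, 58, 18, 20, 17, 19, 21]
design = abab_design_matrix((5, 5, 5, 5))
beta, se, df = fit_ols(design, outcome)
print([round(b, 2) for b in beta])  # level in A1, then the three changes in level
```

In this made-up example the phase means are 61, 36, 57, and 19, so the coefficients come out as the baseline level followed by the successive differences between those means.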


Statistical model Step 2: two-level 
analysis. The two-level analysis involves an 
aggregate estimate of the treatment 
effectiveness across participants. Here, we are 
investigating the replication of the treatment 
effect across participants (within the same 
study), in addition to the replication of the 
treatment effect within participants. As 
a consequence, more generalized conclusions 
can be made, which strengthens the external 
validity of the inferences. In addition, 
variability in effectiveness of the treatment 
between participants can be quantified. One 
way to perform this analysis is to conduct 
a two-level analysis, which takes the 
hierarchical nature of the data into account; 
namely, measurements are nested within each 
of multiple cases. 

The coefficients from the first level, $\beta_{0j}$,
$\beta_{1j}$, $\beta_{2j}$, and $\beta_{3j}$, can be modeled as varying
at the second (participant) level. By fitting
this multilevel model, overall average 
changes in level from one phase to another 
can be obtained in addition to how indivi- 
dual participants deviate from that overall 
change. The level 1 and level 2 equations 
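are presented in Equations (2) and (3):

Level 1:

\[
Y_{ij} = \beta_{0j} + \beta_{1j}\,\mathit{A1B1}_{ij} + \beta_{2j}\,\mathit{B1A2}_{ij} + \beta_{3j}\,\mathit{A2B2}_{ij} + e_{ij}
\quad \text{with } e_{ij} \sim N(0, \sigma_e^2) \tag{2}
\]

Level 2:

\[
\begin{aligned}
\beta_{0j} &= \theta_{00} + u_{0j} \\
\beta_{1j} &= \theta_{10} + u_{1j} \\
\beta_{2j} &= \theta_{20} + u_{2j} \\
\beta_{3j} &= \theta_{30} + u_{3j}
\end{aligned}
\quad \text{with} \quad
\begin{pmatrix} u_{0j} \\ u_{1j} \\ u_{2j} \\ u_{3j} \end{pmatrix}
\sim N\left(
\begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix}
\sigma_{u_0}^2 & \sigma_{u_0 u_1} & \sigma_{u_0 u_2} & \sigma_{u_0 u_3} \\
\sigma_{u_1 u_0} & \sigma_{u_1}^2 & \sigma_{u_1 u_2} & \sigma_{u_1 u_3} \\
\sigma_{u_2 u_0} & \sigma_{u_2 u_1} & \sigma_{u_2}^2 & \sigma_{u_2 u_3} \\
\sigma_{u_3 u_0} & \sigma_{u_3 u_1} & \sigma_{u_3 u_2} & \sigma_{u_3}^2
\end{pmatrix}
\right) \tag{3}
\]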

The first line in Equation (3) indicates
that the baseline level for participant j is
modeled as a function of an average baseline
level, $\theta_{00}$, plus a random deviation from
this mean, $u_{0j}$. The subsequent equations
describe the average change in level
between A1 and B1 ($\theta_{10}$), change in level
between B1 and A2 ($\theta_{20}$), and change in
level between A2 and B2 ($\theta_{30}$) phases,
respectively. The variability in baseline
level (i.e., $\sigma_{u_0}^2$) and variability in changes
in levels (i.e., $\sigma_{u_1}^2$, $\sigma_{u_2}^2$, and $\sigma_{u_3}^2$) are captured
by estimating the variance/covariance
matrix.

Empirical illustration. We use the Seybert
et al. (1996) study for the empirical
illustration of the single-level (individual)
and two-level (average) effect size estimates
for the MBD + WRD design. Seybert et al.
(1996) investigated the effects of
choice-making on the problem behaviors of three
high school students with intellectual
disabilities. In this example, we are looking
only at the outcome variable of occurrence
and nonoccurrence of problem behaviors
within choice and no-choice conditions. The
start of the intervention was staggered across
the three participants, and two baseline
conditions (i.e., no-choice, denoted as A1
and A2 in Figure 4) are interrupted by
treatment conditions (i.e., choice, denoted
as B1 and B2 in Figure 4). Participant 2 (i.e.,
Bob) has no second treatment phase as the
problem behavior remained low when the
treatment was removed (phase A2). The
graphical presentation of the data is given
in Figure 2. The coding of the design
matrix for participant 1 (i.e., Scott) in
accordance with the mathematical model
presented in Equation (1) can be found in
Table 1 (the same coding is applied for the
other cases). The SAS code to run the
analyses is available as a supplement to
this article.

Figure 4. Estimated parameters for each participant across phases. Note: The lines indicate case-specific and
study-specific estimates.
[Figure image not reproduced. Phase labels: no-choice, choice, no-choice, choice. Annotated estimates include $\theta_{00}$ = 37.67 and $\theta_{20}$ = 16.66.]

The output of the single-level analysis is
presented in Table 2, and the visual
presentation of the estimated parameters is
provided in Figure 4. From the single-level
analysis, we can conclude that there
is a demonstration of treatment effectiveness
at three different points in time for
Case 1 (i.e., Scott). When the choice-making
intervention is introduced, we see
a significant drop in problem behavior
[$\beta_{11}$ = −24.28, t(25) = −2.60, p = .018 and
$\beta_{31}$ = −40.01, t(25) = −4.28, p = .032]. When
the choice-making intervention is
removed, we see a significant increase in
problem behavior [$\beta_{21}$ = 20.73, t(25) =
2.33, p = .032]. For Case 2 (i.e., Bob) and
Case 3 (i.e., Maria), there was only one
demonstration of significant treatment
effectiveness [Case 2: $\beta_{12}$ = −22.47, t(20) =
−2.68, p = .015, and Case 3: $\beta_{23}$ = 26.99,
t(25) = 3.21, p = .004]. According to the
WWC design standards (Kratochwill
et al., 2010), the choice-making intervention
was only effective for Scott as three
demonstrations of treatment effectiveness
at three different points in time are
required to demonstrate a causal relationship
between the introduction of the treatment
and the change in outcome score.

The two-level analysis was conducted to 
estimate the overall baseline level and 
changes in level between subsequent 
phases across the three cases in addition
to between-case variability in these
estimates. The two-level analysis enhances
the generalizability of treatment effective- 
ness beyond the cases under investigation. 
For didactic purposes (allowing visual pre- 
sentation of the estimated coefficients, 
Figure 4), a small dataset with only three 
cases is used. In order to run a two-level 
analysis and obtain generalizable esti- 
mates, it is suggested to use a larger data- 
set, including more than three cases. The 
results indicate that the choice-making 
intervention succeeded in reducing the 
problem behavior and large effect size
estimates were obtained for the change in
level between A1 and B1 and A2 and B2
[$\theta_{10}$ = −19.30, t(66) = −4.21, p < .001; $\theta_{30}$
= −24.18, t(1) = −1.53, p = .367]. However,
only one estimate ($\theta_{10}$) is statistically
significant (p < .05).
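
As a quick plausibility check, each t statistic for the fixed effects is the estimate divided by its standard error. The worked ratios below use only the rounded values printed in Table 3, so a difference in the second decimal (as for $\theta_{10}$) is presumably due to rounding of the printed inputs rather than to a different computation.

```latex
\begin{align*}
t &= \frac{\hat{\theta}}{SE(\hat{\theta})} \\[4pt]
t_{\theta_{00}} &= \frac{37.67}{11.74} \approx 3.21 \\
t_{\theta_{10}} &= \frac{-19.30}{4.59} \approx -4.20 \quad (\text{reported as } -4.21) \\
t_{\theta_{20}} &= \frac{16.66}{10.26} \approx 1.62 \\
t_{\theta_{30}} &= \frac{-24.18}{15.76} \approx -1.53
\end{align*}
```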

An additional advantage of using the 
two-level analysis is that the between- 
case variance in treatment effect estimates 
can be estimated. Most variability was 
found in the estimate of the between- 
case variance for the change in level 
between A2 and B2 (Table 3, random 
effects). The results of the single-level 
and two-level analyses are visually pre- 
sented in Figure 4. 

Another advantage of using the two-level 
analysis is that empirical Bayes estimates of 
the case-specific parameters can be obtained. 
The empirical Bayes estimate can be viewed as
an approximation to a fully Bayesian approach
in which information from the full dataset is
used to build the prior distributions
(Shadish, Rindskopf, et al., 2013).
Therefore, the empirical Bayes estimates are 
shrunken toward the mean (the overall aver- 
age fixed effects). These case-specific estimates 
are improved estimates compared to the sin- 
gle-level ordinary least squares estimates 
because information from the entire dataset 
is used (in other words, the empirical Bayes 
estimate is “borrowing strength” from all
available study evidence). For an introduction
to empirical Bayes estimates, see Casella 
(1985). Instead of running three separate sin- 
gle-level analyses, one two-level hierarchical 
linear model can be run, providing both the 
effect size estimates across cases and case-spe- 
cific estimates. The results of the case-specific 
estimates based on the empirical Bayes esti- 
mates are displayed in Table 2 and closely 
match the results of the single-level ordinary 
least squares analyses. 


Multiple-baseline design — Alternating 
treatment design 


In Alternating Treatment Designs (ATDs), 
two or more treatments (possibly following 
a baseline phase) are rapidly alternated 
(Barlow & Hayes, 1979; Barlow et al., 
2009), or treatment sessions are alternated 
with no treatment sessions. Most of the 
ATDs are characterized by a baseline phase 
and two or more treatments, which are 
alternated during the treatment phase. In 
this scenario, the researcher is interested in 
the differential effect between the two treat- 
ment effects (i.e., the relative effectiveness of 
two or more interventions; Horner & Odom, 
2014). Other ATDs are characterized by an
alternation of two or more treatments only, or
by an alternation of two or more treatments
with baseline sessions. In this latter scenario,
a pure baseline comparison is not possible
unless the alternation is preceded or followed
by a phase that includes only baseline
measures (Zimmerman et al., 2019). If the
baseline sessions are alternated with treat- 
ment comparisons from the beginning, it is 
unknown how the participants perform 
without being introduced to the treatment 
(which could be a confounding factor). In
addition, multitreatment interference can
occur: multiple treatments may appear
effective only because they are given in
alternation (one treatment might strengthen
the effectiveness of the other treatment and
vice versa).
Zimmerman et al. (2019) indicate that pos- 
sible multitreatment interference can be 
detected with the inclusion of an initial base- 
line and visual analysis that compares the 
initial baseline level to the baseline observa- 
tions that are part of the alternating 
sequence. Similarly, a phase for a specific 
treatment can be included so that the obser- 
vations within the treatment phase can be 
compared to the treatment observations that 
are part of the alternating sequence. 

To demonstrate a functional relation 
between the independent and dependent 
variables, the data from different treatments 
should not overlap. In addition, the ATD 
study should include at least four data 
points of comparison in each of the treatments
and at least five repetitions of the alternating
sequence to meet the standards of the
What Works Clearinghouse (Horner & Odom,
2014; Kratochwill et al., 2010). 

This combined SCED merges the
unique strengths of ATDs with those of MBDs (i.e.,
external validity, allowing more generalized
conclusions about treatment effects). That is, the
combination uses the rapid comparison
of two or more conditions (ATD) while
the start of the intervention phase is staggered
across participants (MBD). In this
way, the combination of ATD + MBD
allows identifying the treatment that has
the larger effect with higher degrees of
internal and external validity.
Another possibility of the ATDs is that 
researchers may choose to continue only 
the treatments with the strongest effects in 
the final phases of the study (Kratochwill 
et al., 2010). 


Statistical model Step 1: single-level 
analysis. Similar to the single-level (i.e., 
case-specific) analysis for the MBD + WRD, 
a case-by-case intervention effectiveness 
evaluation can be performed for MBD + 
ATD. More specifically, the following
research question is of interest: “Is there
a change in level for Treatment 1 and Treatment
2, respectively?” The effect sizes of interest can
be obtained by introducing dummy variables
for each treatment. The dummy-coded
variables, $\text{Treatment}_{mi}$, indicate the treatment
phase. For instance, $\text{Treatment}_{mi}$ equals one if
the score belongs to treatment phase $m$ on
moment $i$, zero otherwise. If all the
$\text{Treatment}_{mi}$ are zero, then the measurement
occasion belongs to the baseline phase. For
two treatments, the following regression
equation can be used (using treatment
indicators $\text{Treatment}_{1i}$ and $\text{Treatment}_{2i}$):


$$Y_i = \beta_0 + \beta_1\,\text{Treatment}_{1i} + \beta_2\,\text{Treatment}_{2i} + e_i \quad \text{with } e_i \sim N(0, \sigma_e^2) \qquad (4)$$


$\beta_0$ indicates the baseline level, $\beta_1$ refers to
the change in level between the baseline
and Treatment 1, and $\beta_2$ refers to the change
in level between the baseline and
Treatment 2. The difference between $\beta_1$
and $\beta_2$ refers to the differential effect (e.g.,
“Is one of the treatments relatively more
effective?”). Equation (4) can be extended by
modeling linear or non-linear trends 
(Hembry et al., 2015; Moeyaert, Ugille, 
Ferron, Beretvas, et al., 2014), or adding 
more dummy variables in case more than 
two treatments are examined. 
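
The dummy coding and the regression in Equation (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the condition labels (`"B"`, `"T1"`, `"T2"`), the helper names `design_matrix` and `fit_ols`, and the scores are hypothetical and are not taken from the SAS supplement or from either published dataset.

```python
import numpy as np

def design_matrix(conditions):
    """Dummy-code condition labels following Equation (4).

    'B' (baseline) is the reference category; 'T1' and 'T2' each get
    an indicator column. The labels are hypothetical.
    """
    conditions = np.asarray(conditions)
    intercept = np.ones(len(conditions))
    treatment1 = (conditions == "T1").astype(float)
    treatment2 = (conditions == "T2").astype(float)
    return np.column_stack([intercept, treatment1, treatment2])

def fit_ols(y, X):
    """Return OLS coefficients, their standard errors, and residual df."""
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    df = len(y) - X.shape[1]
    sigma2_e = residuals @ residuals / df          # estimate of sigma_e^2
    cov_beta = sigma2_e * np.linalg.inv(X.T @ X)   # covariance of the estimates
    return beta, np.sqrt(np.diag(cov_beta)), df

# Hypothetical case: 4 baseline sessions, then alternating treatments.
conditions = ["B"] * 4 + ["T1", "T2", "T1", "T2", "T1", "T2"]
scores = [60, 62, 58, 60, 35, 41, 33, 39, 37, 40]

X = design_matrix(conditions)
beta, se, df = fit_ols(scores, X)
differential_effect = beta[1] - beta[2]   # beta_1 minus beta_2
t_ratios = beta / se
```

With a pure dummy-coded model, `beta[0]` is the baseline mean and `beta[1]`, `beta[2]` are the differences between each treatment mean and the baseline mean, which is why the coefficients can be read directly as changes in level.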


Statistical model Step 2: two-level 
analysis. This step is similar to Step 2 
described for MBD + WRD design, where 
coefficients from the first level can be 
modeled as varying at the second level: 


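Level 1:

$$Y_{ij} = \beta_{0j} + \beta_{1j}\,\text{Treatment}_{1ij} + \beta_{2j}\,\text{Treatment}_{2ij} + e_{ij} \quad \text{with } e_{ij} \sim N(0, \sigma_e^2) \qquad (5)$$


Level 2:

$$\begin{cases} \beta_{0j} = \theta_{00} + u_{0j} \\ \beta_{1j} = \theta_{10} + u_{1j} \\ \beta_{2j} = \theta_{20} + u_{2j} \end{cases} \quad \text{with } \begin{bmatrix} u_{0j} \\ u_{1j} \\ u_{2j} \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_{u_0}^2 & \sigma_{u_0 u_1} & \sigma_{u_0 u_2} \\ \sigma_{u_1 u_0} & \sigma_{u_1}^2 & \sigma_{u_1 u_2} \\ \sigma_{u_2 u_0} & \sigma_{u_2 u_1} & \sigma_{u_2}^2 \end{bmatrix} \right) \qquad (6)$$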


This two-level analysis allows for making 
more generalized conclusions as overall 
average estimates across cases are obtained 
(the $\theta$s in Equation (6)). As noted before,
case-specific estimates are available by
requesting the empirical Bayes estimates.
By estimating the variance/covariance
matrix, the between-case variance in baseline
level ($\sigma_{u_0}^2$) and in the treatment effect
estimates ($\sigma_{u_1}^2$ and $\sigma_{u_2}^2$) can be obtained.
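
To make the two-level structure concrete, the sketch below simulates data from Equations (5) and (6) and recovers the fixed effects with a simple two-stage shortcut (per-case OLS, then averaging). This is an assumption-laden illustration, not the likelihood-based estimation used for the analyses in this article: all parameter values, the staggered baseline lengths, the helper `case_design`, and the simplification of uncorrelated random effects are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2024)

# Hypothetical population values (not estimates from the article).
theta = np.array([45.0, -30.0, -15.0])   # theta_00, theta_10, theta_20
sd_u = np.array([15.0, 3.0, 8.0])        # between-case SDs; covariances set to 0
sigma_e = 10.0                           # within-case residual SD
n_cases = 12

def case_design(n_baseline, n_pairs=8):
    """Baseline phase followed by alternating Treatment 1 / Treatment 2."""
    conditions = np.array(["B"] * n_baseline + ["T1", "T2"] * n_pairs)
    return np.column_stack([
        np.ones(len(conditions)),
        conditions == "T1",
        conditions == "T2",
    ]).astype(float)

case_betas = []
for j in range(n_cases):
    X_j = case_design(n_baseline=5 + 2 * (j % 3))      # staggered start (MBD)
    beta_j = theta + rng.normal(0.0, sd_u)             # Level 2, Equation (6)
    y_j = X_j @ beta_j + rng.normal(0.0, sigma_e, size=X_j.shape[0])  # Level 1
    b_hat, *_ = np.linalg.lstsq(X_j, y_j, rcond=None)  # stage 1: per-case OLS
    case_betas.append(b_hat)
case_betas = np.array(case_betas)

# Stage 2: average the case-specific estimates to approximate the thetas.
theta_hat = case_betas.mean(axis=0)

# The raw variance of the OLS estimates mixes true between-case variance
# with sampling error, so it tends to overstate sigma_u^2.
between_var_raw = case_betas.var(axis=0, ddof=1)
```

A mixed-model routine estimates the fixed effects and the variance components jointly and separates sampling error from between-case variance, which the two-stage shortcut does not do.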


EMPIRICAL ILLUSTRATION 


The study of Chung and Cannella-Malone 
(2010) will be used for the empirical 
demonstration. This study used an ATD 
that is characterized by a baseline phase 
followed by an alternating phase in which 
baseline and treatment sessions are alter- 
nated. In addition, the ATD is repeated 
across multiple independent participants, 
and the start of the randomization phase 
is staggered across the participants (MBD). 
The purpose of the Chung and Cannella- 
Malone study was to examine separate 
and combined effects of motivating operations
for three participants with multiple
disabilities in four pre-session conditions: 
(1) attention, (2) response blocking, (3) 
attention with response blocking, and (4) 
non-interaction. The dependent variable 
was stereotypic behavior, which was measured
using 10-s partial interval
recording. Inter-observer data were calculated
for pre-session (39% of data) and
treatment (40% of data) conditions, with 
the agreement reaching 98% and 99%.
The graphical display of the data can be
found in Figure 3 (i.e., copied from the
original study) and Figure 5 (i.e.,
recreated graph, using the data retrieved
with WebPlotDigitizer; Rohatgi,
2011).

Figure 5. Estimated parameters for the single-level analysis and two-level analysis. The line during the baseline indicates
the overall average baseline level estimate; the lines during the intervention indicate the estimated challenging behavior 
during the pre-session access intervention and the challenging behavior during the no pre-session access intervention. 


For this empirical demonstration, we will 
analyze the problem behavior for the three 
participants of the study of Chung and 
Cannella-Malone (2010). During the treat- 
ment, participants did two tasks: Task A and 
Task B, which were individualized to the 
needs and skills of the participating stu- 
dents. Students did the tasks in two condi- 
tions as shown in Figure 3: (1) pre-session 
access condition that was identified in the 
functional analysis part of the study and (2) 
no pre-session access. Because of the indi- 
vidual needs in the Chung and Cannella- 
Malone (2010) study, the treatment phases 
are participant-specific. This is commonly 
the case using SCEDs as one of the 
strengths of this design is to adjust the treat- 
ment according to the participant's needs. 
As a consequence, the baseline versus treatment
comparison is not completely the same for the
three participants (i.e., Lilly: baseline versus
5 min blocking; Anna: baseline versus
10 min alone; and Kellie: baseline versus 5 min
blocking). Therefore, strictly speaking, no
experimental conclusions can be drawn
from this combined design (Ledford &
Gast, 2018). However, the treatment phases
can be treated as subcategories of the same
treatment, and as a consequence it is still
meaningful to investigate generalization of
the effect across the three participants. In
the original study, the data were visually 
analyzed, and the results were reported as 
percentages of intervals with problem beha- 
vior. Chung and Cannella-Malone (2010) 
reported that the intervention was success- 
ful for two out of the three participants, 
whose problem behaviors noticeably
decreased. The results of the intervention
for the third participant were contradictory
(i.e., the intervention condition identified
as successful in the previous experiment
failed to decrease problem behaviors).
Although the interventions were
successful for only two out of the three
participants, it is still worth estimating the
size of the intervention effect to complement
this finding. The coding of the design
matrix for Case 1 (i.e., Lilly) in accordance 
with the mathematical model presented in 
Equation (4) can be found in Table 4. The 
SAS code to run the analyses is available
as a supplement to this article. 

The output of the single-level analysis is
presented in Table 5. From the case-by-case
analysis, we can conclude that there is
a demonstration of treatment effectiveness
for both interventions at two different
points in time for Case 1 (i.e., Lilly) and
for Case 2 (i.e., Anna) at the .05 significance
level. When both pre-session and no
pre-session access are introduced, we see
a significant drop in problem behavior for
Lilly [Case 1: $\beta_1 = -25.73$, t(15) = −4.77, p
= .0002 and $\beta_2 = -25.67$, t(15) = −4.76, p
= .0003], and Anna [Case 2: $\beta_1 = -38.53$,
t(41) = −7.03, p < .0001 and $\beta_2 = -20.36$, t(41)
= −3.78, p = .0005]. For Kellie (Case 3),
there was only one demonstration of treatment
effectiveness [$\beta_1 = -25.93$, t(39) =
−3.33, p = .0019].


Table 5. Results of ordinary least squares analysis and empirical Bayes analysis
per participant

Case     Parameter    OLS estimate (SE)    Empirical Bayes estimate (standard error of prediction)
Lilly    $\beta_0$      25.84 (3.34)         26.69 (11.68)
         $\beta_1$     −25.73 (5.39)        −30.46 (1.07)
         $\beta_2$     −25.67 (5.39)        −21.60 (6.49)
Anna     $\beta_0$      48.40 (4.52)         43.16 (…58)
         $\beta_1$     −38.53 (5.48)        −30.75 (1.07)
         $\beta_2$     −20.36 (5.39)        −15.11 (6.01)
Kellie   $\beta_0$      62.10 (5.81)         65.03 (11.56)
         $\beta_1$     −25.93 (7.79)        −30.44 (1.07)
         $\beta_2$      −4.02 (7.79)         −8.59 (6.03)


The two-level analysis was conducted to
generalize treatment effectiveness beyond
individual cases. Again, for didactic purposes,
a small dataset with only three
cases is used. In order to run a two-level
analysis and obtain generalizable estimates,
it is recommended to use a larger dataset.
The results indicate that both the pre-session
access and no pre-session access interventions
succeeded in reducing the problem
behaviors as negative estimates were
obtained for the change in level between
the baseline and Treatment 1 and the baseline
and Treatment 2 [$\theta_{10} = -30.55$, t(61) =
−7.44, p = .012; $\theta_{20} = -15.10$, t(61) = −2.41,
p = .152]. However, only the estimate of
the effect of Treatment 1 is statistically significant
(p < .05). As can be seen in Table 6,
the between-case variance in the treatment
effects was large for Treatment 2 [$\sigma_{u_2}^2$ =
66.37, Z = 0.54, p = .293], and the within-case
residual variance is statistically significant
[$\sigma_e^2$ = 251.33, Z = 6.86, p < .0001].


Table 6. Results of two-level analysis across participants

Fixed effects                    Parameter           Estimate (SE)      t        p
Baseline level                   $\theta_{00}$        44.96 (11.67)     3.85     .057
Change in level Treatment 1      $\theta_{10}$       −30.55 (4.10)     −7.44     .012
Change in level Treatment 2      $\theta_{20}$       −15.10 (6.26)     −2.41     .152

Random effects                   Parameter           Estimate (SE)      Z        p
Baseline level                   $\sigma_{u_0}^2$    381.27 (399.56)    0.95     .17
Change in level Treatment 1      $\sigma_{u_1}^2$      1.16 (43.95)     0.03     .489
Change in level Treatment 2      $\sigma_{u_2}^2$     66.37 (122.25)    0.54     .293
Within-case variance             $\sigma_e^2$        251.33 (36.64)     6.86     <.0001

The visual presentation of the single-level 
analysis and two-level analysis is given in 
Figure 5. 

As mentioned earlier, an extra advantage 
of using the two-level model is that case- 
specific estimates are obtained in addition 
to the overall average estimates across 
cases. The results of the case-specific esti- 
mates based on the empirical Bayes esti- 
mates are displayed in Table 5 and closely 
resemble the results of the single-level 
analyses. 
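
The "shrinkage" behind the empirical Bayes estimates can be illustrated with a deliberately simplified, one-coefficient-at-a-time calculation. The sketch below assumes the textbook univariate weight (between-case variance divided by between-case variance plus squared OLS standard error); the function name `shrink` is hypothetical, and the fitted model shrinks the three coefficients jointly, so this is an approximation rather than a reproduction of the values in Table 5. The inputs are Lilly's OLS estimates from Table 5 and the fixed effects and variance components from Table 6.

```python
def shrink(ols_estimate, ols_se, overall_mean, between_var):
    """Univariate empirical Bayes-style shrinkage (simplifying assumption).

    The weight is the share of total variance attributable to true
    between-case differences; the remainder pulls the case-specific
    estimate toward the overall average effect.
    """
    weight = between_var / (between_var + ols_se ** 2)
    shrunken = weight * ols_estimate + (1.0 - weight) * overall_mean
    return shrunken, weight

# Lilly, Treatment 2: OLS -25.67 (SE 5.39); overall -15.10; variance 66.37.
eb_t2, weight_t2 = shrink(-25.67, 5.39, -15.10, 66.37)

# Lilly, Treatment 1: OLS -25.73 (SE 5.39); overall -30.55; variance 1.16.
eb_t1, weight_t1 = shrink(-25.73, 5.39, -30.55, 1.16)
```

Under these assumptions the Treatment 2 estimate moves to roughly −22.5 and the Treatment 1 estimate to roughly −30.4, in the neighborhood of the −21.60 and −30.46 reported in Table 5. The very small between-case variance for Treatment 1 gives a weight close to zero, so all three case-specific estimates for that effect are pulled almost entirely to the overall average.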


DISCUSSION 


Previous research in the field of SCEDs 
solely focused on estimating intervention 
effectiveness using data from “single” 
SCEDs. This study expands on this and 
introduces an analysis technique suitable 
to estimate treatment effectiveness for 
more complex SCEDs, namely “combined 
SCEDs”. This study is the first study to 
demonstrate how applied researchers can
use an extension of established methodology
to derive an effect size estimate
appropriate for combined designs. The pro- 
posed technique is generic and not limited 
to combined designs. For instance, by 
excluding predictors in the two-level mod- 
els, the technique can be used to quantify 
treatment effects across single SCEDs. 
Combined SCEDs are combinations of sin- 
gle SCEDs, and are frequently used as they 
are more internally and externally valid 
and can answer richer research questions. 
The two most popular combined designs are 
discussed in detail, namely the MBD + 
WRD and MBD + ATD. For these combined 
designs, we discuss (a) the mathematical 
models appropriate for the quantitative 
analysis, (b) the coding of the design 
matrix, (c) the statistical software to per- 
form the analysis, (d) the interpretation of 
the output tables, and (e) the visual presen- 
tation of the obtained coefficients. We 
demonstrate the process using data from 
previously published studies. The purpose 
is to assist single-case researchers in draw- 
ing valid and reliable inferences regarding 
the treatment effectiveness for complex 
designs. 

The single- and two-level hierarchical lin- 
ear modeling (HLM) techniques are sug- 
gested. The two-level HLM is appropriate 
as both participant-specific and overall
average study-specific estimates are
obtained simultaneously (instead of run- 
ning separate single-level analyses for each 
case), which leads to drawing more gener- 
alized inferences. Empirical Bayes estimates 
of the participant-specific treatment effects 
are more precisely estimated compared to 
the OLS (single-level) estimates, but they 
are biased toward the average effect. By 
ignoring the hierarchical structure of the 
data (i.e., measurements are nested within 
cases, and cases are nested within study), 
biased standard errors are obtained (the 
standard errors are too small due to
ignoring the dependency), and, consequently,
the analysis is prone to Type
I errors. The two-level HLM provides
regression-based effect size estimates and 
their standard errors. Therefore, they can 
be used afterward for meta-analytic pur- 
poses. A third level can be added to the 
model, and overall average treatment effec- 
tiveness can be estimated across studies. In 
addition, the variability in treatment effec- 
tiveness between studies can be explored. If 
a large amount of variability is identified, 
moderators can be added to the model. 
Another advantage of summarizing treat- 
ment effects across studies is the increased 
power to identify true treatment effects. 


Limitations and future research directions 


The HLM model introduced in this study is 
the most basic model, which ignores, for 
instance, data trend and autocorrelation, 
and is only appropriate for continuous out- 
comes. In addition, use of conventional 
HLM requires assumptions about multivari- 
ate normality that need to be met in order 
to make valid inferences (Raudenbush & 
Bryk, 2002). Addressing these issues was beyond the scope of
this study, as the focus was on the logic of
modeling combined SCEDs, which is
complex in itself. However, use of the
HLM is flexible, and other complexities can 
be introduced into the model. For instance, 
in case a researcher is studying a target 
behavior or skill in which a trend is 
expected, the introduced models can be 
extended by including a time indicator vari- 
able in the treatment phase. This results in 
two effect size estimators of interest: (1) 
change in level of the dependent variable 
when introducing the treatment and (2) the 
trend during the treatment phase. Two- 
level hierarchical linear modeling including 
a linear time trend is discussed in detail in 
Moeyaert, Ugille, Ferron, Beretvas, et al. 
(2014). 
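
As a rough illustration of how a time indicator adds a trend effect, the sketch below codes a single baseline-to-treatment contrast with a time-in-treatment variable centered at the first treatment session. This coding is one common choice and is an assumption here, as are the helper name `trend_design` and the noise-free hypothetical scores; an ATD with two treatments would need one such time variable per treatment.

```python
import numpy as np

def trend_design(n_baseline, n_treatment):
    """Columns: intercept, treatment indicator, time within treatment.

    Time is 0 at the first treatment session, so the coefficient of the
    indicator is the immediate change in level and the coefficient of
    the time variable is the trend during the treatment phase.
    """
    n = n_baseline + n_treatment
    phase = np.r_[np.zeros(n_baseline), np.ones(n_treatment)]
    time_in_treatment = np.r_[np.zeros(n_baseline),
                              np.arange(n_treatment, dtype=float)]
    return np.column_stack([np.ones(n), phase, time_in_treatment])

X = trend_design(n_baseline=5, n_treatment=6)

# Hypothetical noise-free scores: baseline 50, immediate drop of 20,
# then a further decrease of 2 points per treatment session.
true_beta = np.array([50.0, -20.0, -2.0])
y = X @ true_beta

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
change_in_level, treatment_trend = beta_hat[1], beta_hat[2]
```

Centering time at the first treatment session is what keeps the two effect sizes separately interpretable; with a different centering, the indicator's coefficient would describe the level difference at another point in the treatment phase.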
Another complexity relates particularly
to the MBD + ATD design. In ATDs, the 
effectiveness of two (or more) treatments 
is compared with a common baseline 
phase, which introduces dependency. The 
model can be further extended by exploring 
options to model this dependency (by, for 
instance, estimating the covariance or using 
a more complex estimation technique if 
more cases within a study are included, 
specifically robust variance estimation; 
Hedges et al., 2010). Last, when using 
HLM, caution needs to be exercised when 
interpreting the between-case variance esti- 
mates as severely biased estimates can be 
obtained (Moeyaert et al., 2013). The limitations
discussed here are not specific to
HLM of combined SCEDs but apply to the use of
HLM in general as an analysis technique
for the quantitative integration of SCED
data.

In addition, the results of the two studies 
discussed in this article should be interpreted 
with caution because in both of them there 
was a lack of experimental control. In Seybert 
et al. (1996), the withdrawal and reversal 
design embedded in the combined design did 
not meet the basic replication standards for 
one of the participants. In addition, there was 
a non-effect for the withdrawal of the treatment
for that same participant. As
a consequence, to meet the WWC design stan- 
dards to demonstrate experimental control, 
there is an additional basic replication needed 
for one of the participants of the Seybert et al. 
(1996) study. Similarly, in Chung and 
Cannella-Malone (2010), the treatment to 
reduce problem behaviors was effective for 
two out of three participants. In addition, the 
effectiveness of the treatment was investigated 
across slightly different treatment phases. In 
order to meet the WWC design standards, the 
treatment phases across the participants 
should be identical and there should be three 
demonstrations of the effectiveness of the 
treatment at three different points in time. 


Effect size estimation for these combined 
designs is still informative as it quantifies the 
magnitude of treatment effect. This quantifi- 
cation provides an overall summary of the 
study findings (and variability between parti- 
cipants in treatment effectiveness) and can be 
used for meta-analysis purposes afterward. 
However, we encourage applied SCED 
researchers to design combined SCEDs that
meet the WWC design standards for experi- 
mental control. In order to demonstrate our 
methodology, we were limited to published 
combined designs. The examples included 
are typical for the field and are solely used to 
demonstrate the analysis technique. 

In terms of future research directions, the 
suggested models can be extended by adding 
case characteristics (gender, age, race, etc.) to 
investigate their moderating effect on the 
treatment effectiveness. However, recent 
research related to power indicates that at 
least 12 cases are needed, or 7 cases in combi- 
nation with at least 40 measurement occa- 
sions, to be able to include case 
characteristics in the analyses (Moeyaert 
et al., 2017). This, of course, depends on the 
particular predictors and the value of the true 
treatment effect. Simulation studies can be 
performed in order to investigate the power 
for a particular set of design conditions. Again, 
this is beyond the scope of this paper. Other 
ways of coding the design matrix are also pos- 
sible depending on the specific research ques- 
tions and structure of the data being analyzed. 

To further enhance the internal validity, 
single-case researchers might consider 
introducing randomization when develop- 
ing the combined SCED design. As dis- 
cussed in depth by J. R. Ledford et al. 
(2018), several forms of randomization can 
be incorporated in the design. First, the 
start and the withdrawal of the intervention
can be randomized. In this scenario, it is 
recommended that the randomization does 
not start until baseline stability is established.
Second, the order of the conditions
can be randomized, which is typically done
in ATDs. Unrestricted randomization is not
recommended, because it can produce sequences
that do not represent an ATD (e.g., all baseline
conditions could be chosen first) or sequences
in which one pattern occurs consistently
(e.g., treatment 1 is always administered
after treatment 2). A third form of randomization
is the random assignment of
participants to intervention start points. 
This is relevant for multiple-baseline 
designs across participants. Incorporating 
randomization in the design allows for use 
of randomization tests to make conclusions 
related to treatment effectiveness. The 
advantage of such tests is that the sampling 
distribution is built based upon the rando- 
mization patterns and as a consequence, no 
parametric assumptions are made and 
needed (for more details about randomiza- 
tion, see J. M. Ferron & Levin, 2014; 
Heyvaert et al., 2017). Inclusion of rando- 
mization has the potential to reduce the risk 
of biased effect size estimates. 
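
The logic of a randomization test for a randomly selected intervention start point can be sketched as follows. This is a minimal single-case illustration under stated assumptions: the function name `randomization_test`, the scores, the set of permissible start points, and the choice of the mean difference as the test statistic are all hypothetical, and a one-sided test (expecting a decrease) is assumed.

```python
import numpy as np

def randomization_test(y, actual_start, possible_starts):
    """One-sided randomization test for a randomly chosen start point.

    The reference distribution contains the test statistic computed for
    every start point that the randomization scheme could have selected.
    """
    y = np.asarray(y, dtype=float)

    def mean_difference(start):
        return y[start:].mean() - y[:start].mean()

    observed = mean_difference(actual_start)
    reference = np.array([mean_difference(s) for s in possible_starts])
    # Proportion of permissible start points giving a decrease at least
    # as large as the observed one.
    p_value = np.mean(reference <= observed)
    return observed, p_value

# Hypothetical scores; the intervention actually started at session index 6,
# drawn at random from the permissible start points 4 through 8.
scores = [60, 62, 58, 61, 59, 60, 35, 33, 36, 34, 32, 35]
observed, p_value = randomization_test(scores, actual_start=6,
                                       possible_starts=range(4, 9))
```

The smallest attainable p-value equals one divided by the number of permissible start points (here 1/5), which is one reason why randomization tests gain from combining the start-point randomizations of several participants in a multiple-baseline design.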

In order to increase the external validity of 
treatment effectiveness and contribute to evi- 
dence-based decisions in research, practice 
and policy, multiple SCED studies can be sum- 
marized. Previous research demonstrates how 
the multilevel meta-analytic framework can 
be used to combine single SCEDs (Moeyaert, 
2018; Moeyaert, Ugille, Ferron, Onghena, 
et al., 2014). Therefore, future research is 
needed to demonstrate how pure and com- 
bined SCEDs can be combined using the mul- 
tilevel meta-analytic approach. Similarly, 
a follow-up study can be conducted to
evaluate the consequences of ignoring the 
complex nature of combined designs. 


CONCLUSIONS 


This study is the first study introducing 
and demonstrating a promising methodological
framework for effect size estimation
for combined SCEDs. The two-level hierarchical
model is recommended as it has
the possibility to include variables to 
account for the combined design complex- 
ity. In this study, the logic of modeling the 
combined SCED study is introduced, 
empirical illustrations are given, analysis 
output is discussed, and SAS code is provided
as a supplement. Single-case researchers are
given the tools (and are encouraged) to 
modify and/or further extend the models. 
The proposed method of coding and esti- 
mating effect sizes for combined SCEDs 
can be a useful technique to inform 
researchers and practitioners about the 
effectiveness of interventions.